03:10
2026-05-20
wanglun1996.github.io
large-language-models
Evals Will Break and You Won't See It Coming
Current evaluation methods for large language models (LLMs) are fundamentally reactive and fail to anticipate qualitative shifts in capabilities, such as emergent abilities or strategic information wi…